6.046J Lecture 10: Hashing and amortization
Abstract
10.1 Arrays and Hashing

Arrays are very useful. The items in an array are statically addressed, so that inserting, deleting, and looking up an element each take O(1) time. Thus, arrays are a terrific way to encode functions {1, …, n} → T, where T is some range of values and n is known ahead of time. For example, taking T = {0, 1}, we find that an array A of n bits is a great way to store a subset of {1, …, n}: we set A[i] = 1 if and only if i is in the set (see Figure 10.1). Or, interpreting the bits as binary digits, we can use an n-bit array to store an integer between 0 and 2^n − 1. In this way, we will often identify the set {0, 1}^n with the set {0, …, 2^n − 1}.

What if we wanted to encode subsets of an arbitrary domain U, rather than just {1, …, n}? Or, to put things differently, what if we wanted a keyed (or associative) array, where the keys could be arbitrary strings? While the workings of such data structures (such as dictionaries in Python) are abstracted away in many programming languages, there is usually an array-based solution working behind the scenes. Implementing associative arrays amounts to finding a way to turn a key into an array index. Thus, we are looking for a suitable function U → {1, …, n}, called a hash function. Equipped with this function, we can perform key lookup in two steps: U → {1, …, n} (hash function), then {1, …, n} → T (array lookup); see Figure 10.2. This particular implementation of associative arrays is called a hash table.

There is a problem, however. Typically, the domain U is much larger than {1, …, n}. For any hash function h : U → {1, …, n}, there is some i such that at least |U|/n elements are mapped to i. The set …
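To make the array-plus-hash-function picture concrete, here is a minimal sketch (not taken from the lecture notes themselves) of an associative array backed by a fixed-size array and a hash function, with collisions handled by chaining. The bucket count and the use of Python's built-in hash() are illustrative assumptions.

```python
# A minimal chained hash table: an array of n buckets, a hash function
# mapping keys into {0, ..., n-1}, and per-bucket lists to absorb collisions.

class ChainedHashTable:
    def __init__(self, n=8):
        self.n = n                               # number of array slots
        self.buckets = [[] for _ in range(n)]

    def _index(self, key):
        # hash function: U -> {0, ..., n-1}
        return hash(key) % self.n

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                         # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

# Usage: behaves like a small dictionary.
t = ChainedHashTable()
t.insert("alice", 3)
print(t.lookup("alice"))   # -> 3
```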
Similar resources
6.046J Lecture 4: Minimum spanning trees II
In the previous lecture, we outlined Kruskal’s algorithm for finding an MST in a connected, weighted undirected graph G = (V, E, w): Initially, let T ← ∅ be the empty graph on V. Examine the edges in E in increasing order of weight (break ties arbitrarily). • If an edge connects two unconnected components of T, then add the edge to T. • Else, discard the edge and continue. Terminate when there i...
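As a concrete reading of the procedure sketched above, here is a rough Kruskal implementation using a simple union-find structure; the edge-list representation and vertex labelling are assumptions made for illustration, not the notation of the lecture.

```python
# Kruskal's algorithm: sort edges by weight and add an edge whenever it joins
# two different components, tracked with union-find (path halving, no ranks).

def kruskal(num_vertices, edges):
    """edges: iterable of (weight, u, v) with vertices labelled 0..num_vertices-1."""
    parent = list(range(num_vertices))

    def find(x):                           # component representative of x
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):          # edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # connects two unconnected components
            parent[ru] = rv
            tree.append((u, v, w))
        if len(tree) == num_vertices - 1:
            break                          # spanning tree is complete
    return tree

# Usage on a small triangle graph:
print(kruskal(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))  # -> [(0, 1, 1), (1, 2, 2)]
```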
Using a Queue to De-amortize Cuckoo Hashing in Hardware
Cuckoo hashing combines multiple-choice hashing with the power to move elements, providing hash tables with very high space utilization and low probability of overflow. However, inserting a new object into such a hash table can take substantial time, requiring many elements to be moved. While these events are rare and the amortized performance of these data structures is excellent, this shortco...
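For context, the following simplified sketch (not the construction studied in this paper) shows cuckoo insertion with two tables and two hash functions, which illustrates why a single insert may have to move many elements. The table size, the seeded use of Python's hash(), and the eviction cap are illustrative assumptions.

```python
# Cuckoo hashing: each key has one candidate slot per table; inserting into an
# occupied slot evicts the occupant, which is then reinserted into the other
# table, possibly triggering a chain of further evictions.
import random

class CuckooTable:
    def __init__(self, size=16, max_evictions=32):
        self.size = size
        self.max_evictions = max_evictions
        self.tables = [[None] * size, [None] * size]
        self.seeds = (random.random(), random.random())   # one seed per table

    def _slot(self, which, key):
        # hash function for table `which`: U -> {0, ..., size-1}
        return hash((self.seeds[which], key)) % self.size

    def lookup(self, key):
        # a key can only live in one of its two candidate slots
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        which = 0
        for _ in range(self.max_evictions):
            i = self._slot(which, key)
            if self.tables[which][i] is None:
                self.tables[which][i] = key
                return
            # Slot occupied: evict the occupant and try to place it in the
            # other table next.  Long eviction chains are what make some
            # insertions expensive.
            key, self.tables[which][i] = self.tables[which][i], key
            which = 1 - which
        raise RuntimeError("eviction chain too long; a full implementation would rehash")

# Usage:
t = CuckooTable()
for k in ["a", "b", "c", "d"]:
    t.insert(k)
print(t.lookup("c"), t.lookup("z"))   # -> True False
```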
De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results
Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whereas such an amortized guarantee may be suitable for some applications, in other applications (su...
Cuckoo Hashing for Undergraduates
This lecture note presents and analyses two simple hashing algorithms: “Hashing with Chaining”, and “Cuckoo Hashing”. The analysis uses only very basic (and intuitively understandable) concepts of probability theory, and is meant to be accessible even for undergraduates taking their first algorithms course.
Lecture 10 — March 20, 2012
In the last lecture, we finished up talking about memory hierarchies and linked cache-oblivious data structures with geometric data structures. In this lecture we talk about different approaches to hashing. First, we talk about different hash functions and their properties, from basic universality to k-wise independence to a simple but effective hash function called simple tabulation. Then, we ...
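As a pointer to what "simple tabulation" refers to, here is a minimal sketch under assumed parameters (32-bit keys split into four 8-bit characters): each character indexes an independent table of random words, and the results are XORed together.

```python
# Simple tabulation hashing: precompute one random table per character
# position, then hash a key by XORing the table entries of its characters.
import random

C, R = 4, 8                 # 4 characters of 8 bits each -> 32-bit keys
TABLES = [[random.getrandbits(32) for _ in range(1 << R)] for _ in range(C)]

def simple_tabulation(x):
    h = 0
    for i in range(C):
        char = (x >> (R * i)) & ((1 << R) - 1)   # extract the i-th 8-bit character
        h ^= TABLES[i][char]                     # XOR in that character's table entry
    return h

print(simple_tabulation(0xDEADBEEF))
```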